
RFC for collaborative pinsets #467

Closed
wants to merge 3 commits into from

Conversation

hsanjuan
Collaborator

This should give an idea of the general plan...

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
@hsanjuan hsanjuan self-assigned this Jun 15, 2018
@ghost ghost added the status/in-progress In progress label Jun 15, 2018
@coveralls

coveralls commented Jun 15, 2018


Coverage decreased (-2.8%) to 65.277% when pulling 7737e7e on rfc/collaborative-pinsets into 6e760a7 on master.


First, we need to address the **scalability of the shared state** and updates to it. Using any of the many blockchains that support custom data payloads (like Ethereum) to maintain the shared state addresses the scalability problem for the maintenance of the shared state. Essentially, blockchains are a scalable consensus mechanism for maintaining a shared state, one which grows stronger with the number of peers participating in it.

Secondly, we need to address the **scalability problem for inter-peer communications**: for example, sending metrics so that pins can be allocated, or retrieving the status of an item over a very large number of peers, will be a problem. In a `pin everywhere` scenario, though, where allocations (and thus metrics) are not needed, the problem becomes much smaller. All in all, we should avoid having all peers connect to a single trusted peer (or connect to all other peers). Ideally, peers would be connected only to a subset, or would be able to join the cluster by just `bootstrapping` to any other peer, without necessarily keeping permanent connections active to the trusted peers.
Contributor

Ideally, peers would be connected only to a subset...

Composite clusters would be a good solution for creating some form of hierarchy. Whether it should be based on grouping a few smaller cluster peers together to make them appear as a single bigger peer, or on geographical proximity to each other, I am not sure yet; however they are grouped, it should reduce the communication overhead of the top-level cluster peers.

Collaborator Author

Yes, in fact a composite cluster topology where the subclusters are collaborative clusters with replication factor -1 would actually be the easiest way to proceed, and it solves a bunch of the problems (allocation, peerset management, etc.). It adds some management overhead and single points of failure (the trusted nodes of each subcluster). The latter can probably be addressed with a load balancer.

Collaborator Author

In that case, if new random peers want to join, they would need to join one of the subclusters (preferably the one with the least storage available), or a new subcluster would need to be created, depending on how space and the desired replication factor are managed. So, more management overhead.


In a scenario where peers come and go and come back, this strategy feels suboptimal (although it would work in principle). We should probably work on an allocator component which can efficiently track and handle allocations as peers come and go. For example, if the minimum allocation factor cannot be reached, cluster should still pin the items and track them as underpinned and, as new peers join, it should allocate underpinned items to them. As peers go away, cluster should efficiently identify which pins need re-allocation.

Perhaps the whole allocation format should be re-thought, allowing each of the peers to detect suballocated items and track them, and then informing the allocator that they are going to worry about it. This is, again, a difficult problem which Filecoin has solved properly by providing a market where peers can offer to store content.
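As an illustration of the underpinned-item tracking described above, here is a minimal Go sketch. All types and names (`Allocator`, `Pin`, `Underpinned`) are hypothetical, not the actual allocator component:

```go
package main

import "fmt"

// Pin tracks one item, its desired replication factor and current holders.
type Pin struct {
	CID     string
	Want    int                 // desired replication factor
	Holders map[string]struct{} // peers currently pinning the item
}

// Allocator tracks pins and reacts to peers coming and going.
type Allocator struct {
	pins map[string]*Pin
}

func NewAllocator() *Allocator { return &Allocator{pins: map[string]*Pin{}} }

func (a *Allocator) Add(cid string, want int) {
	a.pins[cid] = &Pin{CID: cid, Want: want, Holders: map[string]struct{}{}}
}

// Underpinned returns the CIDs whose replication factor is not met.
func (a *Allocator) Underpinned() []string {
	var out []string
	for cid, p := range a.pins {
		if len(p.Holders) < p.Want {
			out = append(out, cid)
		}
	}
	return out
}

// PeerJoined allocates currently underpinned items to the new peer.
func (a *Allocator) PeerJoined(peer string) {
	for _, p := range a.pins {
		if len(p.Holders) < p.Want {
			p.Holders[peer] = struct{}{}
		}
	}
}

// PeerLeft removes the peer everywhere; affected pins show up as
// underpinned again on the next Underpinned() call.
func (a *Allocator) PeerLeft(peer string) {
	for _, p := range a.pins {
		delete(p.Holders, peer)
	}
}

func main() {
	a := NewAllocator()
	a.Add("QmFoo", 2)
	a.PeerJoined("peerA")
	fmt.Println(len(a.Underpinned())) // 1 holder < 2: still underpinned
	a.PeerJoined("peerB")
	fmt.Println(len(a.Underpinned())) // factor met
	a.PeerLeft("peerB")
	fmt.Println(len(a.Underpinned())) // underpinned again
}
```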
Contributor

Could you please clarify suballocated items? I just don't understand what it means here.

Collaborator Author

ok


1. Run the go-ipfs daemon
2. `ipfs-cluster-service init --template /ipns/Qmxxx`
2. `ipfs-cluster-service daemon`
Contributor

s/2/3

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
@lanzafame
Contributor

Another aspect of collab clusters that occurred to me yesterday was how to handle different configurations between the collaborating clusters. I was skimming through the Accrual Failure Detector paper and it mentions heartbeat failure detectors and the requirements on configuration. That got me thinking that we are going to need a way of providing a configuration from the current cluster members to any joining clusters. This suggests that any cluster joining a collab cluster that it doesn't 'own' would only have available a subset of the standard cluster configuration.

@hsanjuan
Collaborator Author

Maybe we should allow bootstrapping with no config, and getting all params from the peer you connect to (upon auto-generating a key). Or generally have a system to auto-adjust configurations (or simply, get them from an IPFS hash)

Collaborator

@ZenGround0 ZenGround0 left a comment

Really cool @hsanjuan. If you think it would be useful perhaps we could set up a call at some point to talk about the more blocky-chainy parts of this proposal?



## Prototype: ipfs blockchain and pubsub

Collaborator

All of this talk of trusted peersets and using blockchains for scalability is interesting to me. Could you clarify precisely why you want to use blockchain consensus among the trusted peersets?

Essentially, blockchains are a scalable consensus mechanism for maintaining a shared state, one which grows stronger with the number of peers participating in it.

I think this should be articulated more precisely before jumping into this, so that we really understand the advantage of using a blockchain over traditional Byzantine agreement (BA) among the trusted peers.

I think you might be saying that you want the trusted peerset to have the ability to go offline sporadically and not affect availability and consistency. If that's the case you should consider looking at sleepy consensus which is an application of Nakamoto consensus in the classical setting with a group of trusted peers designed exactly for this purpose.

In this paper, Shi and Pass (of course) show that all traditional consensus mechanisms fail to achieve consistency and liveness under a reasonable notion of sporadic participation. However, Sleepy Consensus is provably secure in this model! Implementing it would likely be a big effort, but there should actually be some modules coming out of Filecoin development that could be really useful for this.

Contributor

@ZenGround0 interesting paper, thanks


There are several ways that a malicious peer might try to interfere with the activity of a collaborative cluster. In general, we should aim to have a working cluster when a majority of the peers in it are not malicious.

* Having a **trusted peerset** makes it an easy target for DDoS attacks on the swarm.
Collaborator

If your trusted peerset is big enough that DOSing the whole thing is expensive, then DOS resistance is built in to an extent, via secret leader elections that fit in well with some possible blockchain consensus mechanisms.

Blanket DOS resistance for the whole peerset seems like one of those things that no level of protocol design can fully protect against.


In a scenario where peers come and go and come back, this strategy feels suboptimal (although it would work in principle). We should probably work on an allocator component which can efficiently track and handle allocations as peers come and go. For example, if the minimum allocation factor cannot be reached, cluster should still pin the items and track them as underpinned and, as new peers join, it should allocate underpinned items to them. As peers go away, cluster should efficiently identify which pins need re-allocation.

Perhaps the whole allocation format should be re-thought, allowing each of the peers to detect items which have not reached a high-enough replication factor and tracking them. Then these peers would inform the allocator that they are going to worry about the items. This is, again, a difficult problem which Filecoin has solved properly by providing a market where peers can offer to store content.
Collaborator

I think a useful differentiator between this project and Filecoin is the heightened trust among participants. It's probably best to focus on use cases that take advantage of the higher trust assumptions, to avoid overlap with Filecoin.


## Prototype: ipfs blockchain and pubsub

The easiest way to approach the proposal above is with a prototype that uses IPFS to store a blockchain and pubsub to announce the current blockchain head. We can announce new chain heads by publishing a pubsub message that points to the chain head's CID. These messages will be signed by one of the **trusted peers**. We can also use all peers to automatically back up the chain. In the case of conflicts, the longest chain wins.
Collaborator

To clarify, is this a blockchain that cluster peers are maintaining themselves, block by block (this is the impression I'm getting)? Or is this rather accessing a smart contract on an existing system? You did mention Ethereum, so I thought I'd ask. Making the cluster consensus component out of a smart contract is an idea I've heard bouncing around, and it is maybe something else to consider if you are still early in the design phase.

Each chain block contains a sequential set of LogOp very much in the fashion of the current Raft log. The consensus layer, upon receiving a pubsub message with a new chain head:

* Verifies it's signed by a trusted peer
* If height > current -> processes the chain and so on
Collaborator

If I understand this correctly you are probably going to want to improve on this proposed consensus implementation. See the link to Sleepy Consensus above.
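A minimal sketch of the two rules from the excerpt (trusted-signer check, then height comparison), with hypothetical types; a real implementation would also fetch the chain from IPFS and apply its LogOps:

```go
package main

import "fmt"

// Head is a hypothetical representation of a received chain head,
// assuming the pubsub layer already verified the message signature
// and exposed the signer's peer ID.
type Head struct {
	CID    string
	Height uint64
	Signer string
}

// Consensus holds the trusted peerset and the current chain head.
type Consensus struct {
	trusted map[string]bool
	current Head
}

// OnHead applies the proposal's rules: accept only heads signed by a
// trusted peer and strictly higher than the current one.
func (c *Consensus) OnHead(h Head) bool {
	if !c.trusted[h.Signer] {
		return false // not signed by a trusted peer
	}
	if h.Height <= c.current.Height {
		return false // not longer than the current chain
	}
	c.current = h // process the new chain (fetch blocks, apply LogOps...)
	return true
}

func main() {
	c := &Consensus{trusted: map[string]bool{"trustedPeer": true}}
	fmt.Println(c.OnHead(Head{"Qm1", 1, "randomPeer"}))  // false: untrusted
	fmt.Println(c.OnHead(Head{"Qm1", 1, "trustedPeer"})) // true
	fmt.Println(c.OnHead(Head{"Qm0", 1, "trustedPeer"})) // false: not higher
}
```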

1. Run the go-ipfs daemon
2. `ipfs-cluster-service init --template /ipns/Qmxxx`
3. `ipfs-cluster-service daemon`

Collaborator

🎉

…mponent will look like

License: MIT
Signed-off-by: Hector Sanjuan <code@hector.link>
@ipfs-cluster ipfs-cluster deleted a comment from GitCop Jan 3, 2019
Other components of cluster are independent of these two tasks, and provide functionality that will be useful in scenarios where the peer-set and pin-set maintenance works in a different manner:

* A state component provides pinset representation and serialization
* An ipfs connector component provides facilities for controlling the ipfs daemon (and the proxy)
Contributor

proxy is a separate component now


### Security: Authentication and authorization for collaborative pinning

In a collaborative pinset scenario, we probably want to have a limited set of peers which are able to modify the shared state and freely connect to API endpoints on any other peers. We call this the **trusted peerset**. This implies that we need to find ways to:
Contributor

@kishansagathiya kishansagathiya Feb 4, 2019

@hsanjuan @lanzafame I just want to confirm my understanding:
A trusted peerset is defined for a given pinset, right? So a cluster can have multiple trusted peersets based on pinsets.
One-to-one mapping between trusted peerset and pinset. Also, a cluster can have multiple pinsets as well.

Contributor

Or do you define a trusted peerset first, and the pinset is defined by the pins held by that peerset? I.e., the pinset is a function of the peerset.

The second one seems more appropriate to me.

Collaborator Author

There is only one pinset per cluster (the shared state).


We can address the first problem by signing state upgrades, allowing any peer to authenticate them (as needed). Libp2p pubsub allows sending signed messages so receiving peers can obtain the public key and verify the signatures.

For the second point, we have to consider the internal RPC API surface. Until now, it has been assumed that all cluster peers can contact and use the RPC endpoints of all others. This is, however, very problematic, as it would allow any peer, for example, to trigger ipfs pin/unpins anywhere. For this reason, we propose **authenticated RPC endpoints**, which will only allow a set of trusted peers to execute any of the RPC calls. This can be added as a feature of libp2p-gorpc, taking advantage of the authentication information provided by libp2p. Note that we will have to make UI adjustments so that non-trusted peers receive appropriate errors when they don't have the rights to perform certain operations.
Contributor

I see why we might need fine-grained permissions per (peer, method) pair.

  • Peers which are part of the trusted peerset can call any RPC API on other members of the trusted peerset
  • But if a peer is not part of the trusted peerset, you still want to let it call some RPC APIs, for example ID()
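A sketch of this kind of per-(peer, method) authorization, with hypothetical names; the real implementation would live in libp2p-gorpc and use libp2p peer IDs rather than strings:

```go
package main

import "fmt"

// Authorizer decides whether a calling peer may invoke an RPC method:
// trusted peers may call anything, other peers only an open subset.
type Authorizer struct {
	trusted map[string]bool // the trusted peerset
	open    map[string]bool // methods any peer may call, e.g. "ID"
}

// Allowed is checked before dispatching each incoming RPC call.
func (a *Authorizer) Allowed(peer, method string) bool {
	if a.trusted[peer] {
		return true
	}
	return a.open[method]
}

func main() {
	auth := &Authorizer{
		trusted: map[string]bool{"trustedPeer": true},
		open:    map[string]bool{"ID": true},
	}
	fmt.Println(auth.Allowed("trustedPeer", "Pin")) // true
	fmt.Println(auth.Allowed("randomPeer", "Pin"))  // false
	fmt.Println(auth.Allowed("randomPeer", "ID"))   // true
}
```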

Contributor

@kishansagathiya kishansagathiya Feb 4, 2019

How I see it, we are trying to separate permissions for operations that just retrieve information from those that change the state.

It seems that untrusted peers can't do much, since the only thing they can do is follow. What are the incentives of having a peer that just follows?

Collaborator Author

What are the incentives of having a peer that just follows?

The same as for running IPFS in general: helping back up content and so on.

@hsanjuan
Collaborator Author

Closing this:

a) It's slightly outdated. We will have to write documentation on how this actually happened.
b) This has happened (or almost).
c) I prefer to write it from scratch.

@hsanjuan hsanjuan closed this Apr 25, 2019
@ghost ghost removed the status/in-progress In progress label Apr 25, 2019